feat: add EAGLE3 speculative decoding support #1

Closed
ichbinhandsome wants to merge 1 commit into master from eagle3-adapt-new-arch

@ichbinhandsome (Owner)

EAGLE3 is an encoder-decoder-based speculative decoding method:

  • Extracts features from the target model at specific layers
  • Uses a feature fusion layer to compress the target features
  • Generates draft tokens with a single-layer decoder
  • Maps the draft vocabulary to the target vocabulary via the d2t tensor
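
The fusion and vocabulary-mapping steps above can be sketched roughly as follows. This is a toy illustration, not the code from this PR. The helper names (`fuse_features`, `map_draft_to_target`) are hypothetical. The example also assumes that fusion is a concatenation followed by a linear projection, and that `d2t` stores a per-draft-token offset (target id = draft id + `d2t[draft id]`); neither is stated in the description.

```python
# Toy sketch of two EAGLE3 steps: feature fusion and draft-to-target vocab mapping.
# All names, shapes, and values here are hypothetical, not taken from the PR.

def fuse_features(layer_features, weight):
    """Concatenate per-layer feature vectors and project them to a smaller size.

    layer_features: list of equal-length vectors, one per extracted target layer.
    weight: projection matrix with shape [out_dim][n_layers * hidden_dim].
    Assumption: fusion is a concat followed by a bias-free linear projection.
    """
    concat = [x for feat in layer_features for x in feat]
    for row in weight:
        if len(row) != len(concat):
            raise ValueError("projection width must match concatenated features")
    return [sum(w * x for w, x in zip(row, concat)) for row in weight]


def map_draft_to_target(draft_ids, d2t):
    """Map draft-vocabulary ids to target-vocabulary ids.

    Assumption: d2t[i] is an offset, so target_id = i + d2t[i].
    """
    return [i + d2t[i] for i in draft_ids]


# Three extracted layers with hidden size 2, compressed back to size 2.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weight = [
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],  # sums the first component of each layer
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],  # sums the second component of each layer
]
fused = fuse_features(features, weight)

# A draft vocabulary of 4 tokens mapped into a larger target vocabulary.
d2t = [0, 4, 9, 20]
target_ids = map_draft_to_target([0, 2, 3], d2t)
```

If `d2t` instead stores target ids directly, the mapping reduces to a plain lookup (`d2t[i]`); the offset form is only the assumption used here.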

Key changes:

  • Add LLM_ARCH_EAGLE3 architecture
  • Add EAGLE3 encoder/decoder graph (src/models/eagle3.cpp)
  • Add feature extraction from target model layers
  • Add g_embeddings handling for decoder input
  • Add GGML_TENSOR_FLAG_SYNC for GPU synchronization
  • Add --eagle3 flag for speculative-simple example
  • Add EAGLE3 model conversion in convert_hf_to_gguf.py
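
For readers unfamiliar with what a speculative decoding example does with the draft tokens, the verification step can be sketched as below. This is the generic greedy accept/reject rule, not code from this PR or from the speculative-simple example, which may use a sampling-based acceptance rule instead. The function name `accept_draft_tokens` is hypothetical.

```python
def accept_draft_tokens(draft_tokens, target_tokens):
    """Greedy verification of a batch of draft tokens.

    draft_tokens: n tokens proposed by the draft model.
    target_tokens: n + 1 tokens chosen greedily by the target model, where
        target_tokens[i] is the target's choice for the position that
        draft_tokens[i] fills, and target_tokens[n] follows the last draft.
    Returns the tokens to keep: the matching prefix plus one target token.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have one more entry than draft_tokens")

    accepted = []
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(target)
            return accepted
        accepted.append(draft)

    # Every draft token matched, so the target's extra token comes for free.
    accepted.append(target_tokens[-1])
    return accepted


partial = accept_draft_tokens([5, 7, 9], [5, 7, 8, 3])  # mismatch at position 2
full = accept_draft_tokens([5, 7], [5, 7, 2])           # all drafts accepted
```

Each call yields at least one token, so decoding always makes progress even when the first draft token is rejected.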
